-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change default labels to use Kubernetes resource name #698
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to do more testing on this in several respects;
- Does an upgrade from the previous version to this one work correctly when an override name is in place?
- Does this work when incorporated into downstream projects (e.g. k8ssandra-operator)? If not, should we make changes to the downstream projects, ensure everything works, and only then merge the changes?
Also @burmanm can you give me an overview of how the backwards compatibility is supposed to work? I don't think this was on the original ticket and I'm curious as to whether that status field you're using is pre-existing, and where it comes from generally.
@@ -596,7 +599,7 @@ func (dc *CassandraDatacenter) SetCondition(condition DatacenterCondition) { | |||
// GetDatacenterLabels ... | |||
func (dc *CassandraDatacenter) GetDatacenterLabels() map[string]string { | |||
labels := dc.GetClusterLabels() | |||
labels[DatacenterLabel] = CleanLabelValue(dc.DatacenterName()) | |||
labels[DatacenterLabel] = CleanLabelValue(dc.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is CleanLabelValue
still necessary now that we're using the Kubernetes meta.name which will be DNS compliant anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not really, but they do have a bit different rules:
dns1123SubdomainFmt string = "[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
dns1123LabelFmt string = "(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?"
So better safe than sorry I guess, especially if Kubernetes would one day update these..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I think the dns1123LabelFmt is more liberal than the dns1123SubdomainFmt, I also don't think k8s will ever change these (since they're linked to the DNS spec).
I'd remove the extra function call just so it is clear to readers that the label is always going to be the exact meta.Name, and not something else that we've interfered with.
This is just a suggestion though, so it isn't blocking.
Yes, the operator upgrade test will deploy with override name in place (with an operator version that uses the override naming) and update from there. The status field is not pre-existing, yet it wouldn't matter if it did or did not. A CRD upgrade will add it and the default is treated as 0 (such as in the cases that the field was not previously populated or does not exists). If the CRD is old one, it would simply fetch too much information on every reconcile but not break anything. I did test this manually against existing cluster. If you wish to test that, the simplest way is to deploy cass-operator with Helm and do an upgrade. For example:
Deploy something, like the one used in operator upgrade:
And then update cass-operator to a version built from this repo:
And the cluster should stay healthy and get the updated values. This process does not update the CRD, only the operator image. As for downstream projects, it's not really a place here to think about that (if the downstream is unable to use upstream, then the fix must happen in the downstream). That said, k8ssandra-operator uses these methods directly so that would work as soon as the first reconcile has finished here. |
In theory that's true, but if we merge this to On the other hand, changes to cass-operator can always break downstream repos and that's why we do integration tests in those downstreams before release. So I'm not necessarily going to block this PR on that basis, if you really want to merge now. Just saying that we may need to do quite a lot of work in the downstreams before we can then upgrade cass-operator.
This is very helpful, thank you! That'll help me get this tested much faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a doc string in the types you need to change that currently reads:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: There's a doc string in the types you need to change that currently reads:
Suggestion: It would be good to have some webhook protection against excessively long dc and cluster names to avoid this kind of problem:
2024-09-12T07:45:13.712Z ERROR Could not create headless service {"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "CassandraDatacenter": {"name":"correct-dc-name","namespace":"default"}, "namespace": "default", "name": "correct-dc-name", "reconcileID": "3e8bfbde-c188-408b-8244-0a74c55edb56", "namespace": "default", "datacenterName": "dcnamespecialcharacters", "clusterName": "cluster1&&OveriddenName", "error": "Service \"cluster1overiddenname-dcnamespecialcharacters-additional-seed-service\" is invalid: metadata.name: Invalid value: \"cluster1overiddenname-dcnamespecialcharacters-additional-seed-service\": must be no more than 63 characters"}
Checks:
Master Branch
Pods getcassandra.datastax.com/datacenter: dcNmCharacters
Services get cassandra.datastax.com/datacenter: dcNmCharacters
PVCs get cassandra.datastax.com/datacenter: dcNmCharacters
sts gets cassandra.datastax.com/datacenter: dcNmCharacters
pvs aren't labelled at all.
ConfigMaps - it doesn't look like there are any.
In the status I see datacenterName: dcNm!&^^ Characters
One odd thing with secrets, it looked like I ended up with two: cl1oridename-superuser cluster1-superuser
. One had cassandra.datastax.com/datacenter: My_Super_Dc
and the other had cassandra.datastax.com/datacenter: dcNmCharacters
. I think the former might not have been cleaned up. Suggestion: investigate if this still happens after this PR, since it is possible that the cleanup failures are related to the buggy labelling behaviour that we are fixing here.
After upgrade
Pods get cassandra.datastax.com/datacenter: dcNmCharacters
Services get cassandra.datastax.com/datacenter: dcNmCharacters
I'm gonna stop here, it doesn't look like things have been upgraded...
In the status I see datacenterName: dcNm!&^^ Characters
Checking the logs, I get
kubectl logs -n cass-operator k8ssandra-cass-operator-84ff466f58-cxlxq
Error from server (BadRequest): container "cass-operator" in pod "k8ssandra-cass-operator-84ff466f58-cxlxq" is waiting to start: trying and failing to pull image
I assume this is because the image isn't published anywhere? Do I need to build it myself and then push it somewhere to get this tested @burmanm ?
Right, it should be available in michaelburman290/cass-operator with the tag I mentioned. Your comments around doc string don't show up at all, no idea what happened here. |
…not datacenter override name. Create new Status field for MetadataVersion and if we keep running old ones (or old CRD), we simply fetch also with the older possible name
@Miles-Garnsey ping |
Going again... Checks: Pods get cassandra.datastax.com/datacenter: DCn In the status I see datacenterName: DC&n_!! After upgrade Pods get cassandra.datastax.com/datacenter: test-dc Otherwise this looks about right to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this PR does:
Changes datacenter label value to be Kubernetes resource name for Datacenter again. Provides backwards compatibility with older versions using the wrong one.
All the label names are updated after a successful reconcile (this is an existing feature). Tested with Helm installation of 1.22.1 and then updating the cass-operator image only.
Which issue(s) this PR fixes:
Fixes #691
Fixes #699
Checklist